LLVM IR
   HOME

TheInfoList



OR:

LLVM is a set of
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and
toolchain In software, a toolchain is a set of programming tools that is used to perform a complex software development task or to create a software product, which is typically another computer program or a set of related programs. In general, the tools form ...
technologies that can be used to develop a front end for any
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
and a back end for any
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
. LLVM is designed around a language-independent
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR) that serves as a
portable Portable may refer to: General * Portable building, a manufactured structure that is built off site and moved in upon completion of site and utility work * Portable classroom, a temporary building installed on the grounds of a school to provide a ...
, high-level
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
that can be optimized with a variety of transformations over multiple passes. LLVM is written in
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
and is designed for
compile-time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
,
link-time In computing, a linker or link editor is a computer system program that takes one or more object files (generated by a compiler or an assembler) and combines them into a single executable file, library file, or another "object" file. A simpler ...
, run-time, and "idle-time" optimization. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of front ends: languages with compilers that use LLVM (or which do not directly use LLVM but can generate compiled programs as LLVM IR) include
ActionScript ActionScript is an object-oriented programming language originally developed by Macromedia Inc. (later acquired by Adobe). It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript (meaning i ...
,
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
, C#,
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...
,
PicoLisp PicoLisp is a programming language, a dialect of the language Lisp. It runs on operating systems including Linux and others that are ''Portable Operating System Interface'' (POSIX) compliant. Its most prominent features are simplicity and minimali ...
,
Crystal A crystal or crystalline solid is a solid material whose constituents (such as atoms, molecules, or ions) are arranged in a highly ordered microscopic structure, forming a crystal lattice that extends in all directions. In addition, macros ...
,
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
, D,
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), in ancient times was a sacred precinct that served as the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient classical world. The oracle ...
, Dylan,
Forth Forth or FORTH may refer to: Arts and entertainment * ''forth'' magazine, an Internet magazine * ''Forth'' (album), by The Verve, 2008 * ''Forth'', a 2011 album by Proto-Kaw * Radio Forth, a group of independent local radio stations in Scotla ...
, Fortran, Free Basic,
Free Pascal Free Pascal Compiler (FPC) is a compiler for the closely related programming-language dialects Pascal and Object Pascal. It is free software released under the GNU General Public License, witexception clausesthat allow static linking against it ...
, Graphical G,
Halide In chemistry, a halide (rarely halogenide) is a binary chemical compound, of which one part is a halogen atom and the other part is an element or radical that is less electronegative (or more electropositive) than the halogen, to make a fluor ...
,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming languages, ...
,
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
, Kotlin,
Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...
,
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
,
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
,
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
's SQL and PLpgSQL,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, Scala,
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
, XC,
Xojo The Xojo programming environment and programming language is developed and commercially marketed by Xojo, Inc. of Austin, Texas for software development targeting macOS, Microsoft Windows, Linux, iOS, the Web and Raspberry Pi. Xojo uses a propri ...
and Zig.


History

The LLVM project started in 2000 at the
University of Illinois at Urbana–Champaign The University of Illinois Urbana-Champaign (U of I, Illinois, University of Illinois, or UIUC) is a public land-grant research university in Illinois in the twin cities of Champaign and Urbana. It is the flagship institution of the Universit ...
, under the direction of
Vikram Adve Vikram Adve (born 28 June 1966) is the Donald B. Gillies professor in the Department of Computer Science and a Professor in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Academia In 2020, Vikram Adve is ...
and
Chris Lattner Christopher Arthur Lattner (born 1978) is an American software engineer, former Google and Tesla employee and co-founder of LLVM, Clang compiler, MLIR compiler infrastructure and the Swift programming language. , he is the co-founder and CEO ...
. LLVM was originally developed as a research infrastructure to investigate
dynamic compilation Dynamic compilation is a process used by some programming language implementations to gain performance during program execution. Although the technique originated in Smalltalk,Peter L. Deutsch and Alan Schiffman. "Efficient Implementation of the S ...
techniques for static and
dynamic Dynamics (from Greek δυναμικός ''dynamikos'' "powerful", from δύναμις ''dynamis'' "power") or dynamic may refer to: Physics and engineering * Dynamics (mechanics) ** Aerodynamics, the study of the motion of air ** Analytical dynam ...
programming languages. LLVM was released under the
University of Illinois/NCSA Open Source License The University of Illinois/NCSA Open Source License, or UIUC license, is a permissive free software license, based on the MIT/X11 license and the 3-clause BSD license. By combining parts of these two licenses, it attempts to be clearer and more ...
, a
permissive free software licence A permissive software license, sometimes also called BSD-like or BSD-style license, is a free-software license which instead of copyleft protections, carries only minimal restrictions on how the software can be used, modified, and redistributed, ...
. In 2005,
Apple Inc. Apple Inc. is an American multinational technology company headquartered in Cupertino, California, United States. Apple is the largest technology company by revenue (totaling in 2021) and, as of June 2022, is the world's biggest company ...
hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems. LLVM has been an integral part of Apple's
Xcode Xcode is Apple's integrated development environment (IDE) for macOS, used to develop software for macOS, iOS, iPadOS, watchOS, and tvOS. It was initially released in late 2003; the latest stable release is version 14.2, released on December 13, ...
development tools for
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and lapt ...
and
iOS iOS (formerly iPhone OS) is a mobile operating system created and developed by Apple Inc. exclusively for its hardware. It is the operating system that powers many of the company's mobile devices, including the iPhone; the term also includes ...
since Xcode 4. In 2006, Lattner started working on a new project called
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection (GCC), ...
. The combination of Clang front-end and LLVM back-end is called Clang/LLVM or simply Clang. The name ''LLVM'' was originally an
initialism An acronym is a word or name formed from the initial components of a longer name or phrase. Acronyms are usually formed from the initial letters of words, as in ''NATO'' (''North Atlantic Treaty Organization''), but sometimes use syllables, as ...
for ''Low Level Virtual Machine''. However, the LLVM project evolved into an umbrella project that has little relationship to what most current developers think of as a
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
. This made the initialism "confusing" and "inappropriate", and since 2011 LLVM is "officially no longer an acronym", but a brand that applies to the LLVM umbrella project. The project encompasses the LLVM
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR), the LLVM
debugger A debugger or debugging tool is a computer program used to software testing, test and debugging, debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the pr ...
, the LLVM implementation of the
C++ Standard Library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard.ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
(with full support of
C++11 C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by ...
and
C++14 C++14 is a version of the ISO/IEC 14882 standard for the C++ programming language. It is intended to be a small extension over C++11, featuring mainly bug fixes and small improvements, and was replaced by C++17. Its approval was announced on August ...
), etc. LLVM is administered by the LLVM Foundation. Compiler engineer Tanya Lattner became its president in 2014, and was in post . ''"For designing and implementing LLVM"'', the
Association for Computing Machinery The Association for Computing Machinery (ACM) is a US-based international learned society for computing. It was founded in 1947 and is the world's largest scientific and educational computing society. The ACM is a non-profit professional member ...
presented Vikram Adve, Chris Lattner, and
Evan Cheng Evan is both an English and Welsh male given name derived from "Iefan", a Welsh form for the name John (name), John. In other languages it could be compared to "Ivan (name), Ivan", "Ian", and "Juan"; the name John itself is derived from the ancie ...
with the 2012
ACM Software System Award The ACM Software System Award is an annual award that honors people or an organization "for developing a software system that has had a lasting influence, reflected in contributions to concepts, in commercial acceptance, or both". It is awarded b ...
. The project was originally available under the
UIUC license The University of Illinois/NCSA Open Source License, or UIUC license, is a permissive free software license, based on the MIT/X11 license and the 3-clause BSD license. By combining parts of these two licenses, it attempts to be clearer and more ...
. After v9.0.0 released in 2019, LLVM relicensed to the Apache License 2.0 with LLVM Exceptions. about 400 contributions had not been relicensed.


Features

LLVM can provide the middle layers of a complete compiler system, taking
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR) code from a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
and emitting an optimized IR. This new IR can then be converted and linked into machine-dependent
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
code for a target platform. LLVM can accept the IR from the
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
(GCC)
toolchain In software, a toolchain is a set of programming tools that is used to perform a complex software development task or to create a software product, which is typically another computer program or a set of related programs. In general, the tools form ...
, allowing it to be used with a wide array of existing compiler front-ends written for that project. LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at run-time. LLVM supports a language-independent
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
and
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
. Each instruction is in
static single assignment form In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used. Existing var ...
(SSA), meaning that each
variable Variable may refer to: * Variable (computer science), a symbolic name associated with a value and whose associated value may be changed * Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
(called a typed register) is assigned once and then frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late-compiling from the IR to machine code via
just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
(JIT), similar to
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. The type system consists of basic types such as
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
or
floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
numbers and five derived types:
pointers Pointer may refer to: Places * Pointer, Kentucky * Pointers, New Jersey * Pointers Airport, Wasco County, Oregon, United States * The Pointers, a pair of rocks off Antarctica People with the name * Pointer (surname), a surname (including a l ...
,
arrays An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
, vectors,
structures A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as ...
, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a mix of structures, functions and arrays of
function pointer A function pointer, also called a subroutine pointer or procedure pointer, is a pointer that points to a function. As opposed to referencing a data value, a function pointer points to executable code within memory. Dereferencing the function poin ...
s. The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for
partial evaluation In computing, partial evaluation is a technique for several different types of program optimization by specialization. The most straightforward application is to produce new programs that run faster than the originals while being guaranteed to ...
in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
pipeline of
Mac OS X Leopard Mac OS X Leopard (version 10.5) is the sixth software versioning, major release of macOS, Apple Inc., Apple's desktop and server operating system for Macintosh computers. Leopard was released on October 26, 2007 as the successor of Mac OS X Tig ...
(v10.5) to provide support for missing hardware features. Graphics code within the OpenGL stack can be left in intermediate representation and then compiled when run on the target machine. On systems with high-end
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s (GPUs), the resulting code remains quite thin, passing the instructions on to the GPU with minimal changes. On systems with low-end GPUs, LLVM will compile optional procedures that run on the local
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
(CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using
Intel GMA The Intel Graphics Media Accelerator (GMA) is a series of integrated graphics processors introduced in 2004 by Intel, replacing the earlier Intel Extreme Graphics series and being succeeded by the Intel HD and Iris Graphics series. This series t ...
chipsets. A similar system was developed under the
Gallium3D Mesa, also called Mesa3D and The Mesa 3D Graphics Library, is an open source implementation of OpenGL, Vulkan, and other graphics API specifications. Mesa translates these specifications to vendor-specific graphics hardware drivers. Its most ...
LLVMpipe, and incorporated into the
GNOME A gnome is a mythological creature and diminutive spirit in Renaissance magic and alchemy, first introduced by Paracelsus in the 16th century and later adopted by more recent authors including those of modern fantasy literature. Its characte ...
shell to allow it to run without a proper 3D hardware driver loaded. For run-time performance of the compiled programs, GCC formerly outperformed LLVM by 10% on average in 2011. Newer results in 2013 indicate that LLVM has now caught up with GCC in this area, and is now compiling binaries of approximately equal performance.


Components

LLVM has become an umbrella project containing multiple components.


Front ends

LLVM was originally written to be a replacement for the existing
code generator In computing, Code generation denotes software techniques or systems that generate program code which may then be used independently of the generator system in a runtime environment. Specific articles: * Code generation (compiler), a mechanism to pr ...
in the GCC stack, and many of the GCC front ends have been modified to work with it, resulting in the now-defunct LLVM-GCC suite. The modifications generally involve a
GIMPLE The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
-to-LLVM IR step so that LLVM optimizers and codegen can be used instead of GCC's GIMPLE system. Apple was a significant user of LLVM-GCC through
Xcode Xcode is Apple's integrated development environment (IDE) for macOS, used to develop software for macOS, iOS, iPadOS, watchOS, and tvOS. It was initially released in late 2003; the latest stable release is version 14.2, released on December 13, ...
4.x (2013). This use of the GCC frontend was considered mostly a temporary measure, but with the advent of
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection (GCC), ...
and advantages of LLVM and Clang's modern and modular codebase (as well as compilation speed), is mostly obsolete. LLVM currently supports compiling of
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
, C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, D,
Delphi Delphi (; ), in legend previously called Pytho (Πυθώ), in ancient times was a sacred precinct that served as the seat of Pythia, the major oracle who was consulted about important decisions throughout the ancient classical world. The oracle ...
, Fortran,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
,
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, and
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
using various front ends. Widespread interest in LLVM has led to several efforts to develop new front ends for a variety of languages. The one that has received the most attention is Clang, a new compiler supporting C, C++, and Objective-C. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a system that is more easily integrated with
integrated development environment An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of at least a source code editor, build automation tools a ...
s (IDEs) and has wider support for multithreading. Support for
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syste ...
directives has been included in
Clang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection (GCC), ...
since release 3.8. The Utrecht
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
compiler can generate code for LLVM. Though the generator is in the early stages of development, in many cases it has been more efficient than the C code generator. There is a
Glasgow Haskell Compiler The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell. It provides a cross-platform environment for the writing and testing of Haskell code and it supports numerous extensions, ...
(GHC) backend using LLVM that achieves a 30% speed-up of the compiled code relative to native code compiling via GHC or C code generation followed by compiling, missing only one of the many optimizing techniques implemented by the GHC. Many other components are in various stages of development, including, but not limited to, the
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
compiler, a
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming languages, ...
front end, a
Common Intermediate Language Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. ...
(CIL) front end, the
MacRuby MacRuby is a discontinued implementation of the Ruby language that ran on the Objective-C runtime and CoreFoundation framework under development by Apple Inc. which "was supposed to replace RubyCocoa". It targeted Ruby 1.9 and used the high per ...
implementation of Ruby 1.9, various front ends for
Standard ML Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of the ...
, and a new
graph coloring In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices o ...
register allocator.


Intermediate representation

The core of LLVM is the
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR), a low-level programming language similar to assembly. IR is a strongly typed
reduced instruction set computing In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set comput ...
(RISC) instruction set which abstracts away most details of the target. For example, the calling convention is abstracted through ''call'' and ''ret'' instructions with explicit arguments. Also, instead of a fixed set of registers, IR uses an infinite set of temporaries of the form %0, %1, etc. LLVM supports three equivalent forms of IR: a human-readable assembly format, an in-memory format suitable for frontends, and a dense bitcode format for serializing. A simple
"Hello, world!" program A "Hello, World!" program is generally a computer program that ignores any input and outputs or displays a message similar to "Hello, World!". A small piece of code in most general-purpose programming languages, this program is used to illustra ...
in the IR format:For the full documentation, refer to . @.str = internal constant 4 x i8c"hello, world\0A\00" declare i32 @printf(ptr, ...) define i32 @main(i32 %argc, ptr %argv) nounwind The many different conventions used and features provided by different targets mean that LLVM cannot truly produce a target-independent IR and retarget it without breaking some established rules. Examples of target dependence beyond what is explicitly mentioned in the documentation can be found in a 2011 proposal for "wordcode", a fully target-independent variant of LLVM IR intended for online distribution. A more practical example is
PNaCl Google Native Client (NaCl) is a discontinued sandbox (computer security), sandboxing technology for running either a subset of Intel x86, ARM architecture, ARM, or MIPS architecture, MIPS native code, or a portable executable, in a sandbox. It ...
. The LLVM project also introduces another type of intermediate representation called MLIR which helps build reusable and extensible compiler infrastructure by employing a plugin architecture named Dialect. It enables the use of higher-level information on the program structure in the process of optimization including polyhedral compilation.


Back ends

At version 13, LLVM supports many
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s, including
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
,
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
,
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
,
Qualcomm Hexagon Hexagon is the brand name for a family of digital signal processor (DSP) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” According to Qualcomm, the Hexagon architecture is desig ...
, MIPS,
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
Parallel Thread Execution Parallel Thread Execution (PTX or NVPTX) is a low-level parallel thread execution virtual machine and instruction set architecture used in Nvidia's CUDA programming environment. The NVCC compiler translates code written in CUDA, a C++-like lang ...
(PTX; called ''NVPTX'' in LLVM documentation),
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
, AMD TeraScale, most
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
GPU recent ones (called ''AMDGPU'' in LLVM documentation),
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed ...
,
z/Architecture z/Architecture, initially and briefly called ESA Modal Extensions (ESAME), is IBM's 64-bit complex instruction set computer (CISC) instruction set architecture, implemented by its mainframe computers. IBM introduced its first z/Architecture-b ...
(called ''SystemZ'' in LLVM documentation), and
XCore XMOS is a fabless semiconductor company that develops audio products and multicore microcontrollers. Company history XMOS was founded in July 2005 by Ali Dixon, James Foster, Noel Hurley, David May, and Hitesh Mehta. It received seed funding ...
. Some features are not available on some platforms. Most features are present for IA-32, x86-64, z/Architecture, ARM, and PowerPC.
RISC-V RISC-V (pronounced "risk-five" where five refers to the number of generations of RISC architecture that were developed at the University of California, Berkeley since 1981) is an open standard instruction set architecture (ISA) based on estab ...
is supported as of version 7. In the past, LLVM also supported other backends, fully or partially, including C backend, Cell SPU, mblaze (MicroBlaze), AMD R600, DEC/Compaq
Alpha Alpha (uppercase , lowercase ; grc, ἄλφα, ''álpha'', or ell, άλφα, álfa) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter aleph , whic ...
(
Alpha AXP Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers ( ...
) and Nios2, but that hardware is mostly obsolete, and LLVM developers decided the support and maintenance costs were no longer justified. LLVM also supports
WebAssembly WebAssembly (sometimes abbreviated Wasm) defines a portable binary-code format and a corresponding text format for executable programs as well as software interfaces for facilitating interactions between such programs and their host environment ...
as a target, enabling compiled programs to execute in WebAssembly-enabled environments such as
Google Chrome Google Chrome is a cross-platform web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS ...
/
Chromium Chromium is a chemical element with the symbol Cr and atomic number 24. It is the first element in group 6. It is a steely-grey, lustrous, hard, and brittle transition metal. Chromium metal is valued for its high corrosion resistance and hardne ...
,
Firefox Mozilla Firefox, or simply Firefox, is a free and open-source web browser developed by the Mozilla Foundation and its subsidiary, the Mozilla Corporation. It uses the Gecko rendering engine to display web pages, which implements current and ...
,
Microsoft Edge Microsoft Edge is a proprietary, cross-platform web browser created by Microsoft. It was first released in 2015 as part of Windows 10 and Xbox One and later ported to other platforms as a fork of Google's Chromium open-source project: Android ...
,
Apple Safari Safari is a web browser developed by Apple. It is built into macOS, iOS, and iPadOS, and uses Apple's open-source browser engine, WebKit, which was derived from KHTML. Safari was introduced in Mac OS X Panther in January 2003. It was inclu ...
or
WAVM WAVM (91.7 FM) is a high school radio station broadcasting from Maynard High School in Maynard, Massachusetts. Station programming provides the local area with news and church service broadcasts among other types of programming. Founded in 19 ...
. LLVM-compliant WebAssembly compilers typically support mostly unmodified source code written in C, C++, D, Rust, Nim, Kotlin and several other languages. The LLVM machine code (MC) subproject is LLVM's framework for translating machine instructions between textual forms and machine code. Formerly, LLVM relied on the system assembler, or one provided by a toolchain, to translate assembly into machine code. LLVM MC's integrated assembler supports most LLVM targets, including IA-32, x86-64, ARM, and ARM64. For some targets, including the various MIPS instruction sets, integrated assembly support is usable but still in the beta stage.


Linker

The lld subproject is an attempt to develop a built-in, platform-independent
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
for LLVM. lld aims to remove dependence on a third-party linker. , lld supports
ELF An elf () is a type of humanoid supernatural being in Germanic mythology and folklore. Elves appear especially in North Germanic mythology. They are subsequently mentioned in Snorri Sturluson's Icelandic Prose Edda. He distinguishes "ligh ...
, PE/COFF,
Mach-O Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically-loaded code, and core dumps. It was developed to replace the a.out format. Mach-O is used by some systems based on the M ...
, and
WebAssembly WebAssembly (sometimes abbreviated Wasm) defines a portable binary-code format and a corresponding text format for executable programs as well as software interfaces for facilitating interactions between such programs and their host environment ...
in descending order of completeness. lld is faster than both flavors of
GNU ld In computing, a linker or link editor is a computer System software, system program that takes one or more object files (generated by a compiler or an assembler (computing), assembler) and combines them into a single executable file, library (co ...
. Unlike the GNU linkers, lld has built-in support for
link-time optimization Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimiz ...
(LTO). This allows for faster code generation as it bypasses the use of a linker plugin, but on the other hand prohibits interoperability with other flavors of LTO.


C++ Standard Library

The LLVM project includes an implementation of the
C++ Standard Library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard.ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
called libc++, dual-licensed under the
MIT License The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s. As a permissive license, it puts only very limited restriction on reuse and has, therefore, high license comp ...
and the
UIUC license The University of Illinois/NCSA Open Source License, or UIUC license, is a permissive free software license, based on the MIT/X11 license and the 3-clause BSD license. By combining parts of these two licenses, it attempts to be clearer and more ...
. Since v9.0.0, it was relicensed to the Apache License 2.0 with LLVM Exceptions.


Polly

This implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.


Debugger


C Standard Library

llvm-libc is an incomplete, upcoming, ABI independent
C standard library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard.ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it wa ...
designed by and for the LLVM project.


Derivatives

Due to its permissive license, many vendors release their own tuned forks of LLVM. This is officially recognized by LLVM's documentation, which suggests against using version numbers in feature checks for this reason. Some of the vendors include: * AMD's AMD Optimizing C/C++ Compiler is based on LLVM, Clang, and Flang. * Apple maintains an open-source fork for
Xcode Xcode is Apple's integrated development environment (IDE) for macOS, used to develop software for macOS, iOS, iPadOS, watchOS, and tvOS. It was initially released in late 2003; the latest stable release is version 14.2, released on December 13, ...
. *
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
maintains a fork of LLVM 9 as the "Arm Compiler". *
Flang Clang is a compiler front end for the C, C++, Objective-C, and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA, and HIP frameworks. It acts as a drop-in replacement for the GNU Compiler Collection (GCC), ...
, Fortran project in development * IBM is adopting LLVM in its
C/C++ The C and C++ programming languages are closely related but have many significant differences. C++ began as a fork of an early, pre-standardized C, and was designed to be mostly source-and-link compatible with C compilers of the time. Due to thi ...
and Fortran compilers. *
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
has adopted LLVM for their next generation
Intel C++ Compiler Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems. Overview Intel o ...
. * The
Los Alamos National Laboratory Los Alamos National Laboratory (often shortened as Los Alamos and LANL) is one of the sixteen research and development laboratories of the United States Department of Energy (DOE), located a short distance northwest of Santa Fe, New Mexico, ...
has a parallel-computing fork of LLVM 8 called "Kitsune". *
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
uses LLVM in the implementation of its NVVM
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
Compiler. The NVVM compiler is distinct from the "NVPTX" backend mentioned in the Backends section, although both generate PTX code for Nvidia GPUs. * Since 2013, Sony has been using LLVM's primary front-end Clang compiler in the
software development kit A software development kit (SDK) is a collection of software development tools in one installable package. They facilitate the creation of applications by having a compiler, debugger and sometimes a software framework. They are normally specific to ...
(SDK) of its
PlayStation 4 The PlayStation 4 (PS4) is a home video game console developed by Sony Interactive Entertainment. Announced as the successor to the PlayStation 3 in February 2013, it was launched on November 15, 2013, in North America, November 29, 2013 in ...
console.


See also

*
Common Intermediate Language Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. ...
*
HHVM HipHop Virtual Machine (HHVM) is an open-source virtual machine based on just-in-time (JIT) compilation that serves as an execution engine for the Hack programming language and used to support PHP execution before the release of HHVM version 4 ...
*
C-- C-- (pronounced ''C minus minus'') is a C (programming language), C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-l ...
*
Amsterdam Compiler Kit The Amsterdam Compiler Kit (ACK) is a retargetable compiler suite and toolchain written by Andrew Tanenbaum and Ceriel Jacobs, since 2005 maintained by David Given. It has frontends for the following programming languages: C, Pascal, Modula-2 ...
(ACK) * LLDB (debugger) *
GNU lightning GNU lightning is a free-software library for generating assembly language code at run-time. Version 2.1.3, released in September 2019, supports backends for SPARC (32-bit), x86 (32- and 64-bit), MIPS, ARM (32- and 64-bit), ia64, HPPA, PowerPC ...
*
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
(GCC) *
Pure Pure may refer to: Computing * A pure function * A pure virtual function * PureSystems, a family of computer systems introduced by IBM in 2012 * Pure Software, a company founded in 1991 by Reed Hastings to support the Purify tool * Pure-FTPd, F ...
*
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
*
ROCm ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous ...
*
Emscripten Emscripten is an LLVM/Clang-based compiler that compiles C and C++ source code to WebAssembly (or to a subset of JavaScript known as asm.js, its original compilation target before the advent of WebAssembly in 2017), primarily for execution in we ...
*
TenDRA Distribution Format The abstract machine TDF (originally the Ten15 Distribution Format, but more recently redefined as the TenDRA Distribution Format) evolved at the Royal Signals and Radar Establishment in the UK as a successor to Ten15. Its design allowed support ...
*
Architecture Neutral Distribution Format The Architecture Neutral Distribution Format (ANDF) in computing is a technology allowing common "shrink wrapped" binary application programs to be distributed for use on conformant Unix systems, translated to run on different underlying hardware ...
(ANDF) * Comparison of application virtualization software *
SPIR-V Standard Portable Intermediate Representation (SPIR) is an intermediate language for parallel compute and graphics by Khronos Group. It is used in multiple execution environments, including the Vulkan graphics API and the OpenCL compute API, to re ...
* University of Illinois at Urbana Champaign discoveries & innovations


Literature

*
Chris Lattner Christopher Arthur Lattner (born 1978) is an American software engineer, former Google and Tesla employee and co-founder of LLVM, Clang compiler, MLIR compiler infrastructure and the Swift programming language. , he is the co-founder and CEO ...
-
The Architecture of Open Source Applications - Chapter 11 LLVM
', , released 2012 under
CC BY A Creative Commons (CC) license is one of several public copyright licenses that enable the free distribution of an otherwise copyrighted "work".A "work" is any creative material made by a person. A painting, a graphic, a book, a song/lyrics ...
3.0 (
Open Access Open access (OA) is a set of principles and a range of practices through which research outputs are distributed online, free of access charges or other barriers. With open access strictly defined (according to the 2001 definition), or libre op ...
).
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
a published paper by Chris Lattner, Vikram Adve


References


External links

* {{Use mdy dates, date=October 2018 Compilers Free compilers and interpreters Register-based virtual machines Software using the NCSA license Software using the Apache license